Skip to content

Dev#839

Closed
IamMuhammadZeeshan wants to merge 5 commits into
mainfrom
dev
Closed

Dev#839
IamMuhammadZeeshan wants to merge 5 commits into
mainfrom
dev

Conversation

@IamMuhammadZeeshan

Copy link
Copy Markdown
Member

Summary

Related To

Description

Additional Notes (Optional)

Qazalbash and others added 5 commits May 31, 2026 05:43
…ions (#836)

* feat: implement HDF5 support for saving inference data and configurations

* fix: convert HDF5 dataset attributes to a dictionary

* Add report generation functionality with Papermill and Jupyter Notebook template

- Introduced `generate_report.py` to handle report generation from input data files.
- Integrated Papermill for executing Jupyter Notebook templates and generating HTML reports.
- Added a new Jupyter Notebook template `template_report.ipynb` for report formatting.
- Updated `pyproject.toml` to include new dependencies: `nbconvert`, `papermill`, and `plotly`.
- Registered new command line entry point for report generation in `pyproject.toml`.
- Included the notebook template in package data for distribution.

* Update src/gwkokab/analysis/report/generate_report.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* style: format output notebook path assignment

* feat: add corner library dependency for enhanced plotting capabilities

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* fix: ensure compatibility with JAX Array in HDF5 write function

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* feat: using file descriptor to reduce IO overhead

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
… and `SubPopulationModelCore` (#837)

* refactor: rename parameters for consistency in `MultiSourceModelCore` and `SubPopulationModelCore`

* Update src/gwkokab/analysis/multisource/common.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* fix: reorder parameters in `MultiSourceModelCore`

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* feat: utils to repack and replace `.h5` and `.hdf5` files

* fix: remove bugs from option parsing and update epilog
…#833)

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request transitions the storage of inference results, configurations, and metadata from plain text and JSON files to a unified HDF5 format (inference_data.hdf5), updating both FlowMC and NumPyro samplers accordingly. It also introduces an automated HTML report generation tool using a Jupyter notebook template, standardizes parameter naming across hybrid models, and adds an HDF5 repacking script. The review feedback highlights several critical issues: a potential crash in generate_marginal_probs when inspecting classes without an explicit __init__, a bug in write_to_hdf5 where JAX Arrays are not converted to NumPy arrays during dataset overwrites, an argument parsing error in h5repack.py that prevents the file argument from being matched, and a lack of robustness in the report notebook when certain datasets are missing from the HDF5 file.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +201 to +206
if data is not None:
if dataset_path in f:
del f[dataset_path]
elif isinstance(data, Array):
data = np.asarray(data)
obj = f.create_dataset(dataset_path, data=data)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

In write_to_hdf5, the elif isinstance(data, Array): block is skipped if dataset_path in f is True. This means JAX Arrays are not converted to NumPy arrays when overwriting an existing dataset, which can cause h5py to fail or behave unexpectedly. Changing elif to if ensures the conversion always runs.

Suggested change
if data is not None:
if dataset_path in f:
del f[dataset_path]
elif isinstance(data, Array):
data = np.asarray(data)
obj = f.create_dataset(dataset_path, data=data)
if data is not None:
if dataset_path in f:
del f[dataset_path]
if isinstance(data, Array):
data = np.asarray(data)
obj = f.create_dataset(dataset_path, data=data)

Comment on lines +380 to +381
param_names = list(inspect.signature(model_meta_cls.__init__).parameters.keys()) # type: ignore
param_names.remove("self")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Using inspect.signature(model_meta_cls.__init__) and then param_names.remove("self") will crash if model_meta_cls does not explicitly define __init__ (like EccentricityMattersCore), because object.__init__ has no self parameter in its signature (or raises ValueError). Using inspect.signature(model_meta_cls) is the standard, safe way to get the constructor signature without self and works correctly for all classes.

Suggested change
param_names = list(inspect.signature(model_meta_cls.__init__).parameters.keys()) # type: ignore
param_names.remove("self")
param_names = list(inspect.signature(model_meta_cls).parameters.keys())

Comment on lines +36 to +42
parser.add_argument(
"options",
nargs=argparse.REMAINDER,
help="Additional options to pass to h5repack (e.g., '-f GZIP=9 -f SHUF')",
)
parser.add_argument("file", help="Path to the HDF5 file to repack")
args, options = parser.parse_known_args()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Defining a positional argument "options" with nargs=argparse.REMAINDER before "file" causes "options" to consume all remaining arguments (including the file path), leaving "file" unmatched and causing the script to always fail with a required argument error. Removing the "options" argument from the parser allows parse_known_args() to correctly capture all unrecognized options in the options list while successfully parsing "file".

    parser.add_argument("file", help="Path to the HDF5 file to repack")
    args, options = parser.parse_known_args()

"outputs": [],
"source": [
"if SAMPLER_NAME == \"flowMC\":\n",
" global_acc_train = read_from_hdf5(inference_data_file, \"/acceptances/global/train\")\n",

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The notebook assumes that all training/production datasets (like /acceptances/global/train, loss, /chains/train/...) are always present in the HDF5 file. If any phase was skipped or had no data, these datasets won't exist, causing read_from_hdf5 to raise a ValueError and crash the entire automated report generation. Consider checking if the datasets exist in the file before reading them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants